Big data correlation mining algorithm based on factorial design
TANG Xiaochuan, LUO Liang
Journal of Computer Applications, 2018, 38(9): 2507-2510. DOI: 10.11772/j.issn.1001-9081.2018020460
To address dimensionality reduction in high-dimensional big data, a feature selection algorithm based on statistical factorial design, named Full Factorial Design (FFD), was proposed. Firstly, the factor effects of the factorial design were used to measure the correlation between features and the target variable. Secondly, a divide-and-conquer algorithm was proposed for finding the optimal factorial design for a given dataset. Thirdly, because traditional experimental design requires experiments to be executed manually, a data-driven approach was proposed that automatically searches the input dataset for the response values of the factorial design. Finally, the factor effects were calculated from the design matrix and the average response values, and the features and interactions were ranked by their factor effects to obtain the significant ones. The experimental results show that the average classification error rate of FFD was 2.95, 3.33 and 6.62 percentage points lower than those of Mutual Information Maximisation (MIM), Joint Mutual Information Maximisation (JMIM) and ReliefF, respectively. Therefore, FFD can effectively identify significant features and interactions that are highly correlated with the target variable in real-world datasets.
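The final step described above (factor effects computed from a design matrix and average response values) can be illustrated with a minimal sketch. It assumes binary (two-level) features, takes the response of each design run to be the average target over matching dataset rows, and omits the divide-and-conquer design search. The function name `factor_effects` and the toy data are hypothetical and not taken from the paper.

```python
from itertools import combinations, product

def factor_effects(rows, targets, max_order=2):
    """Rank features and interactions by two-level factor effect.

    rows: list of tuples of 0/1 feature values; targets: numeric responses.
    Assumption: a plain 2^k full factorial design over all k features.
    """
    k = len(rows[0])
    # Data-driven responses: average the target over rows matching each run.
    sums, counts = {}, {}
    for x, y in zip(rows, targets):
        sums[x] = sums.get(x, 0.0) + y
        counts[x] = counts.get(x, 0) + 1
    runs = [r for r in product((0, 1), repeat=k) if r in counts]
    avg = {r: sums[r] / counts[r] for r in runs}

    effects = {}
    for order in range(1, max_order + 1):
        for term in combinations(range(k), order):
            high, low = [], []
            for r in runs:
                # Column sign of the term: product of the +1/-1 factor levels.
                sign = 1
                for j in term:
                    sign *= 1 if r[j] == 1 else -1
                (high if sign > 0 else low).append(avg[r])
            if high and low:
                effects[term] = sum(high) / len(high) - sum(low) / len(low)
    # Largest absolute effect first.
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy data: the target depends on feature 0 and on the 1-2 interaction.
toy_rows = list(product((0, 1), repeat=3))
toy_targets = [2 * a + (1 if b == c else 0) for a, b, c in toy_rows]
ranked = factor_effects(toy_rows, toy_targets)
```

In this toy example the main effect of feature 0 and the interaction between features 1 and 2 are ranked first, while the remaining terms have zero effect.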
Interaction based algorithm for feature selection in text categorization
TANG Xiaochuan, QIU Xiwei, LUO Liang
Journal of Computer Applications, 2018, 38(7): 1857-1861. DOI: 10.11772/j.issn.1001-9081.2018010114
To address feature selection in text categorization, a maximum-interaction feature selection algorithm, called Max-Interaction, was proposed. Firstly, an information-theoretic feature selection model was established based on Joint Mutual Information (JMI). Secondly, the assumptions of existing feature selection algorithms were relaxed, and the feature selection problem was transformed into an interaction optimization problem. Thirdly, the maximum of the minimum (max-min) method was employed to avoid overestimating higher-order interactions. Finally, a feature selection algorithm for text categorization based on sequential forward search and high-order interaction was proposed. In the comparison experiments, Max-Interaction improved the average classification accuracy by 5.5% over Interaction Weight Feature Selection (IWFS) and by 6% over Chi-square, and it outperformed the other methods in 93% of the experiments. Therefore, Max-Interaction can effectively improve the performance of feature selection in text categorization.
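The combination of sequential forward search with a maximum-of-the-minimum criterion can be sketched as follows. The abstract does not give the Max-Interaction scoring function, so this sketch substitutes a generic stand-in: each candidate is scored by its smallest pairwise joint mutual information with the already selected features, and the candidate with the largest such minimum is added. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def max_min_forward_search(columns, labels, k):
    """Greedy forward selection with a max-min joint-information score.

    Assumption: the score is min over selected s of I((f, s); labels),
    a stand-in for the paper's interaction measure.
    """
    remaining = set(range(len(columns)))
    # Start with the single most informative feature (lowest index on ties).
    first = max(sorted(remaining), key=lambda j: mutual_information(columns[j], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(j):
            # Worst-case joint information with any already selected feature.
            return min(
                mutual_information(list(zip(columns[j], columns[s])), labels)
                for s in selected
            )
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: the label is the XOR of features 0 and 1; feature 2 duplicates feature 0.
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = [0, 1, 0, 1, 0, 1, 0, 1]
f2 = list(f0)
toy_labels = [a ^ b for a, b in zip(f0, f1)]
chosen = max_min_forward_search([f0, f1, f2], toy_labels, 2)
```

On this toy data no single feature is informative alone, but the pair of features 0 and 1 jointly determines the label, so the search prefers feature 1 over the redundant copy of feature 0.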